Following the development of artificial intelligence technology, a new trend 
has emerged in which this technology is increasingly used in case 
investigations. In this study, we developed a lie detection system that can 
instantly determine whether an interrogee is lying depending on their 
emotional responses to specific questions. Investigators then use these data, 
in addition to their personal experiences and case information, to adjust their 
interrogation strategies and techniques, thereby leading the interrogee to 
confess and accelerating the investigation process. Our system collects data 
using OpenFace and performs deep learning using gcForest. Training was 
performed separately on a real-life trial dataset, the Miami University 
Deception Detection Database, and a bag-of-lies dataset, and the 
correspondingly trained systems achieved detection accuracies of 95.11%, 
90.83%, and 88.19%, respectively. 

1. INTRODUCTION 

During case investigations, officers may request criminal suspects to appear for an interrogation in 
order to investigate a crime or gather evidence. During these interrogations, officers are required to maintain 
an honest demeanor and are prohibited from using improper techniques, such as violence, coercion, bribery, 
fraud, or exhaustion. The purpose of these interrogations is to lead the interrogee into telling the truth and to 
glean information from their statements in order to further clarify the case or uncover new clues. Numerous 
cases are solved as a result of interrogee statements, indicating that interrogations are a crucial means for 
investigators to gather criminal facts. However, even if the interrogee is not the actual perpetrator, they may 
occasionally lie for a variety of reasons. In addition, to avoid punishment, actual criminals tend to deny or 
distort criminal facts. Therefore, if officers are unable to determine the veracity of a suspect’s statements or 
decipher their intentions, then the investigation may become stalled or may proceed in the wrong direction, 
thereby wasting time and resources and allowing criminals to go unpunished. Hence, investigators must 
employ effective interrogation techniques while adhering to the law. 

As a result of the development of hardware equipment and artificial intelligence (AI) in the area of 
image recognition, the traditional method of human supervision has evolved to include the integration of 
systems and information technologies to achieve smart image recognition. The computational and logical 
capabilities of various deep learning algorithms have also enabled the rapid processing of big data and the 
improvement of the learning and recognition efficiency [1]-[10]. Studies in this area have spanned a wide 
range of fields, including crime warning, license plate recognition, local disease recognition, and depression 
and dementia treatment [11]-[14]. Foreign law enforcement agencies are currently investigating the use of 
face recognition in interrogations in an effort to reduce prejudice against people of different races and 
genders [15]. 

The primary purpose of a case investigation is to obtain criminal evidence and clarify case-related 
facts. In this scenario, the investigation process is conducted in an interactive fashion. In each case, physical 
evidence, documentary evidence, and related parties are interconnected. To solve a crime, the investigators 
must rely on existing clues and evidence. Thus, interrogation is one of the most crucial means for 
investigators to clarify facts and obtain additional case-related information during case investigations. 

Keli [16] examined the impact ratio (with vs. without impact) of the 17 reasons why 20 criminal 
suspects voluntarily confessed as shown in Table 1. Among these reasons, lies being exposed impacted the 
fifth largest number of suspects, indicating that if officers can immediately expose the interrogee’s lies during 
interrogation, the interrogee becomes more likely to confess. The concept of facial microexpressions was first 
proposed in 1966 [17], and it refers to the subconscious responses that people exhibit when stimulated. The 
majority of these expressions consist of subtle changes in facial features and muscles. These changes are 
difficult to conceal and are prevalent across all racial and age groups [18]-[20]. In recent years, facial 
microexpressions have been used in studies to identify lying interrogees. 


Table 1. Reasons why criminal suspects voluntarily confessed 


No. Reasons Impact ratio (Yes) Impact ratio (No) 
1 The investigators have gathered all the criminal evidence, so resistance is futile. 17 3 
2 Everybody involved has told the truth, so I did too. 12 8 
3 To receive lighter punishment. 18 2 
4 I have to be responsible for what I have done, after all. 9 11 
5 My lies have been exposed. 15 5 
6 It feels better to tell the truth than not. 17 3 
7 To thank the investigators for their help. 7 13 
8 To leave the interrogation room early. 16 4 
9 At the time, I felt pressured to tell the truth. 14 6 
10 I confessed to conceal other crimes, but I only told part of the truth. 12 8 
11 It is only a matter of time before everything is exposed, so it is better to tell the truth early. 10 10 
12 To show the interrogators that I was willing to cooperate. 15 5 
13 I do not know why I confessed; I did it on impulse. 4 16 
14 The facts will be found eventually, so I might as well tell the truth. 14 6 
15 The investigators were very nice to me. 6 14 
16 To be released on bail earlier. 14 6 
17 The case-handling departments actually did not have sufficient criminal evidence, and I voluntarily confessed under the assumption that they did. 7 13 


In recent years, many units have developed vastly divergent perspectives on the use of polygraphs. 
Polygraphs are no longer used by law enforcement units as evidence in trials. Instead, they are used as 
interrogation aids. During an interrogation, the interrogee may lie or conceal the truth for a variety of reasons. 
Therefore, in this study, we developed a lie recognition system that uses AI deep learning to replace 
traditional lie detection techniques with a noncontact-based lie detection technique so that investigations are 
not delayed if the interrogee refuses to take the polygraph or is required to take the polygraph at another time. 
During interrogations, our system can objectively determine the emotional state and truthfulness of the 
interrogee. The results can then be combined with case information and the investigators’ own experiences to 
modify the direction of the interrogation, thereby accelerating the investigation process. 

The rest of this paper is organized as follows. Section 2 introduces our study methods, including 
how we collected the data and extracted useful information for subsequent analyses, section 3 presents our 
results and discussion, and section 4 concludes the study. 


2. METHODS 

Because truth is essential in decision-making, detecting misleading information before such 
information is included in the decision-making process is crucial. To screen and clarify a large amount of 
case-related information within a short period of time, lowering the detection thresholds to avoid missing 
critical information is essential. Several lie detection methods are currently available, and the choice among 
them depends on the case, environment, and purpose. The most common of these is the polygraph, which relies 
on contact sensors and the analysis of physiological changes, such as heart rate fluctuations [21], to detect 
deceptive behavior. The emergence of the facial action coding system (FACS) and AI deep learning has since 
changed the traditional approach to lie detection. 

Deep neural networks (DNNs) have made considerable progress in image and sound processing. In 
addition, a number of deep learning techniques, such as DNNs, convolutional neural networks, deep belief 
networks, and recurrent neural networks, have been applied to computer vision, speech recognition, natural 
language processing, audio recognition, and bioinformatics, with remarkable results. 

Our proposed system captures facial action unit (AU) signals by using OpenFace [22]-[24], 
performs supervised learning by using gcForest, and determines whether an interrogee is likely to confess 
if their lies are exposed. The detection process of the proposed system is depicted in Figure 1. The system 
performs detection and makes decisions in real time during interrogations. 


[Figure 1 labels: Video, 30 frames per second; Perform micro-expression capture and AU encoding; Real-time AI network computing; Weights; Confession possibility] 


Figure 1. Detection process of the proposed system 


2.1. Facial microexpressions 

By using the human face anatomy and matching facial expressions to their meanings, the FACS 
defines 44 facial AUs. These AUs describe local facial muscle movements and thus allow facial expressions 
to be described objectively, accurately, and precisely [25]. By using AI face recognition and model learning 
technologies, AUs can be marked quickly in real time, which improves FACS-based recognition and helps 
interrogators identify lying interrogees and expose their lies, thereby increasing the likelihood of a 
confession. 


2.2. Deep learning 

Deep learning is a branch of machine learning that uses algorithms based on artificial neural 
networks (ANNs) to learn data features. In the 1980s, a number of essential concepts related to 
connectionism emerged, including distributed representation [26]. Distributed representation eliminates the 
need for users to manually extract features and enables computers to simultaneously learn how to extract and 
use features. Feature learning aims to discover superior representation methods, construct more robust 
models, and uncover representation methods from large amounts of unlabeled data. These representation 
methods are inspired by neuroscience and are loosely based on models of information processing and 
communication in the nervous system. For example, neural coding attempts to define the relationships 
between stimuli and the associated neuronal responses in the brain. Figure 2 shows the differences between 
a shallow neural network and a deep learning neural network model [27]. The shallow neural network has 
only one hidden layer, whereas the DNN has two or more hidden layers. 


[Figure 2 panels: Simple Neural Network; Deep Learning Neural Network. Legend: Input Layer, Hidden Layer, Output Layer] 


Figure 2. The difference between simple and deep learning neural networks 


2.3. gcForest 

Introduced by Zhou and Feng [28], multigrained cascade forest (gcForest) uses a cascading method 
to stack multilayer random forests in order to improve feature representation and learning performance. 
gcForest performs representation learning through cascaded random forests, processes data layer by layer 
in a manner similar to deep neural networks (DNNs), and uses different forest types to create learning 
diversity and form a cascade (waterfall-like) structure. The model also uses sliding windows and multigrained 
scanning to preprocess input features, and it feeds the extracted feature vectors into the cascade forests, 
adding and training cascade levels until the validation results converge. Compared with DNNs, gcForest 
requires considerably less training data to achieve satisfactory performance. In addition, because gcForest 
has fewer hyperparameters, requires little hyperparameter tuning, and adapts its tree-based components 
automatically, it consumes few computational resources and samples, making its training relatively 
straightforward [29]. 
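
The sliding-window step of multigrained scanning can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the window sizes (4 and 8), and the 32-dimensional input vector are assumptions chosen only for illustration. In gcForest, each extracted sub-vector would subsequently be passed to the forests; that stage is omitted here.

```python
from typing import Dict, List, Sequence


def sliding_windows(features: Sequence[float], window: int,
                    stride: int = 1) -> List[List[float]]:
    """Return every contiguous sub-vector of length `window`."""
    if window <= 0 or stride <= 0:
        raise ValueError("window and stride must be positive")
    if window > len(features):
        return []
    last_start = len(features) - window
    return [list(features[i:i + window])
            for i in range(0, last_start + 1, stride)]


def multigrained_scan(features: Sequence[float],
                      windows: Sequence[int] = (4, 8)) -> Dict[int, List[List[float]]]:
    """Scan the same vector at several granularities (hypothetical sizes)."""
    return {w: sliding_windows(features, w) for w in windows}


# Hypothetical 32-dimensional feature vector (not taken from the paper).
vector = [float(i) for i in range(32)]
scanned = multigrained_scan(vector, windows=(4, 8))
for size, sub_vectors in scanned.items():
    print(f"window {size}: {len(sub_vectors)} sub-vectors")
```

A vector of length n scanned with a window of length w and a stride of 1 yields n - w + 1 sub-vectors, so smaller windows produce more, finer-grained instances.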


2.4. Training models 

In this study, three video datasets were used for AI deep learning training: a real-life trial dataset 
[30], the Miami University Deception Detection Database (MU3D) [31], and a bag-of-lies (BOL) dataset [32]. 
The training videos were categorized by risk level to reflect the liabilities and risks that the interrogees 
faced while lying. A high risk indicated that if the interrogee lied, they would face actual repercussions, 
such as criminal charges and imprisonment. The datasets were categorized as follows: i) the real-life trial 
dataset included high-risk videos of actual court proceedings; ii) the Miami University Deception Detection 
Database and BOL dataset included experimental laboratory-produced low-risk videos. 


2.5. Effectiveness assessment 
Confusion matrices are an essential instrument for evaluating the performance of classification 
models. They generate indicators (i.e., accuracy, precision, recall, and F1-score) that can be used to evaluate 
system detection results. 
a. Confusion matrix: The four possible outcomes are defined as follows. 
- True positive (TP): Lied and tested positive for lying 
- False positive (FP): Did not lie but tested positive for lying 
- True negative (TN): Did not lie and tested negative for lying 
- False negative (FN): Lied but tested negative for lying 
b. Accuracy: The overall prediction accuracy is calculated as follows. 

$$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$

c. Precision: Precision is the percentage of samples that are truly positive out of all samples predicted to be 
positive. It is calculated as follows. 

$$\mathrm{Precision} = \frac{TP}{TP + FP}$$


d. Recall: Recall is the percentage of samples correctly predicted to be positive out of all samples that are 
actually positive. It is calculated as follows. 

$$\mathrm{Recall} = \frac{TP}{TP + FN}$$


e. F1-score: Precision and recall have an interdependent relationship. Optimally, they should both be high. 
However, in real-life scenarios, the higher one is, the lower the other tends to become. Therefore, precision 
and recall must be comprehensively assessed. To this end, the most common method is to use the 
F1-score, the harmonic mean of precision (P) and recall (R), as a comprehensive indicator as follows. 

$$F_1 = \frac{2PR}{P + R}$$


f. Area under the receiver operating characteristic curve (AUC): The AUC is a common statistical value that 
represents the predictive ability of a classifier. The greater the area under the curve is, the stronger the 
predictive ability becomes: 
- An AUC value greater than 0.5 indicates that the classification effect of the classifier is superior to 
random guesses, indicating that the model provides valuable predictions. 
- An AUC value of 0.5 indicates that the classification effect of the classifier is equivalent to random 
guesses, indicating that the model does not provide valuable predictions. 
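
The indicators defined in items a to f can be sketched in a few lines of Python. This is an illustrative sketch rather than the evaluation code used in the study: the labels, scores, and the 0.5 decision threshold below are hypothetical, and the AUC is computed through its rank interpretation (the probability that a randomly chosen lie receives a higher score than a randomly chosen truth, with ties counted as one half).

```python
from typing import Dict, Sequence


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, int]:
    """Count TP, FP, TN, and FN, where label 1 means 'lied'."""
    counts = {"TP": 0, "FP": 0, "TN": 0, "FN": 0}
    for truth, pred in zip(y_true, y_pred):
        if truth == 1:
            counts["TP" if pred == 1 else "FN"] += 1
        else:
            counts["FP" if pred == 1 else "TN"] += 1
    return counts


def classification_metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Accuracy, precision, recall, and F1-score from confusion-matrix counts."""
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


def auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical labels and classifier scores (not data from the study).
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.4, 0.3, 0.6, 0.1, 0.7, 0.2]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed threshold of 0.5

counts = confusion_counts(y_true, y_pred)
metrics = classification_metrics(counts["TP"], counts["FP"],
                                 counts["TN"], counts["FN"])
print(counts)
print(metrics)
print("AUC:", auc(y_true, scores))
```

Unlike the other four indicators, the AUC is computed from the continuous scores rather than from thresholded predictions, so it does not depend on the choice of decision threshold.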


3. RESULTS AND DISCUSSION 

During an interrogation, the interrogee may lie for a variety of reasons, including the avoidance of 
consequences or the protection of others. However, even lies can provide investigative leads. In this study, 
we developed a facial microexpression recognition system to immediately detect and expose lies, thereby 
compelling suspects to confess and assist with interrogations. Figure 3 depicts the DNN structure of the 
proposed system. 


[Figure 3 labels: Deep learning network; Take 30 frames as a block] 


Figure 3. AI learning structure of the proposed lie detection system 


Before the proposed lie detection system was actually used, it was first trained. The videos in 
the datasets, sampled at 30 frames per second, comprised 50% lies and 50% truths. The samples were then 
divided into two groups as follows: 70% for training and 30% for testing. To achieve facial marking 
and detection, a constrained local neural field was used, which provided over 700 features, of which 35 
were associated with facial AUs. Because the p values of AU01_r, AU23_r, and AU17_c were too low, 
they were disregarded during the training process to increase the detection success rate of the training 
model [33]. 
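
The preprocessing described above can be sketched as follows. This is only an illustration under stated assumptions: the helper names are hypothetical, the column names are assumed to be already stripped of whitespace, the grouping of 30 consecutive frames into one flattened sample is one possible reading of the 30-frame block, and the source does not state whether the 70%/30% split was made per video or per block.

```python
import random
from typing import Dict, List, Sequence, Tuple

# The three AU features that the text says were disregarded during training.
DROPPED = {"AU01_r", "AU23_r", "AU17_c"}
BLOCK_SIZE = 30  # frames per block, i.e., one second at 30 frames per second


def select_au_columns(header: Sequence[str]) -> List[str]:
    """Keep AU columns only, excluding the disregarded ones."""
    return [name for name in header
            if name.startswith("AU") and name not in DROPPED]


def frames_to_blocks(frames: Sequence[Dict[str, float]],
                     columns: Sequence[str],
                     block_size: int = BLOCK_SIZE) -> List[List[float]]:
    """Flatten each run of `block_size` frames into one feature vector.

    Trailing frames that do not fill a whole block are discarded (assumption).
    """
    blocks = []
    for start in range(0, len(frames) - block_size + 1, block_size):
        block = frames[start:start + block_size]
        blocks.append([frame[c] for frame in block for c in columns])
    return blocks


def split_train_test(samples: Sequence, train_fraction: float = 0.7,
                     seed: int = 0) -> Tuple[list, list]:
    """Shuffle reproducibly and split into training and testing groups."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Hypothetical header and frames standing in for a per-frame feature table.
header = ["frame", "confidence", "AU01_r", "AU02_r", "AU17_c", "AU23_r", "AU45_c"]
columns = select_au_columns(header)
frames = [{name: float(i) for name in header} for i in range(65)]
blocks = frames_to_blocks(frames, columns)
train, test = split_train_test(list(range(10)))
print(columns, len(blocks), len(train), len(test))
```

If blocks from the same video end up in both groups, the test score may be optimistic, so splitting at the video level is the more conservative choice.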

Before detection was performed, the proposed system was trained using the real-life trial dataset, 
Miami University Deception Detection Database, and BOL dataset. The real-life trial dataset included 
high-risk videos of actual court proceedings, whereas the other two datasets included experimental 
laboratory-produced low-risk videos. Table 2 lists the detection results of the proposed system. We compared 
the accuracy, precision, recall, and F1-score of the proposed system trained using the three different datasets. 
The results indicated that the system produced the highest accuracy (95.11%) when trained using actual court 
videos and the lowest accuracy (88.19%) when trained using the BOL dataset. All other indicators 
exceeded 80% on every dataset. 


Table 2. Assessment indicators of the effectiveness of the proposed lie detection system 


Dataset Accuracy Precision Recall F1-Score 
TRIAL 95.11% 95.39% 93.97% 94.68% 
MU3D 90.83% 88.99% 81.27% 84.96% 
BOL 88.19% 88.64% 89.06% 88.85% 
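
As a quick consistency check, the F1-scores in Table 2 can be recomputed from the reported precision and recall values by using the harmonic-mean definition of the F1-score. The sketch below is illustrative only; the variable and function names are hypothetical, and the small tolerance accounts for the fact that the published precision and recall values are already rounded to two decimal places.

```python
# (precision %, recall %, reported F1 %) copied from Table 2.
reported = {
    "TRIAL": (95.39, 93.97, 94.68),
    "MU3D": (88.99, 81.27, 84.96),
    "BOL": (88.64, 89.06, 88.85),
}


def f1_from(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (works on percentages too)."""
    return 2 * precision * recall / (precision + recall)


TOLERANCE = 0.02  # percentage points; allows for rounding of the inputs

for name, (p, r, f1) in reported.items():
    recomputed = f1_from(p, r)
    agrees = abs(recomputed - f1) < TOLERANCE
    print(f"{name}: reported {f1:.2f}%, recomputed {recomputed:.3f}%, agrees={agrees}")
```

Under these assumptions, the recomputed values agree with the reported F1-scores to within rounding for all three datasets.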


Figure 4 depicts the system effectiveness indicators as measured by the AUC. When the proposed 
system was trained using the real-life trial dataset, the Miami University Deception Detection Database, and 
the BOL dataset, AUC values of 95.03%, 88.29%, and 88.14%, respectively, were obtained. Among the three 
datasets, the proposed system demonstrated the best detection performance for all indicators when trained 
using actual court videos. 


Figure 4. AUC values of the proposed lie detection system 


Table 3 compares the effectiveness of the proposed system with that of other court video evaluation 
systems. The proposed system achieved the highest scores across all four indicators, with an accuracy and 
F1-score of 95.11% and 94.68%, respectively, indicating its superior detection stability and reliability. 


Table 3. Comparison of system effectiveness indicators 


No Methods Accuracy Precision Recall F1-Score 
1 Feature Auto-Extraction+Fusion [34] 66.36% 43.57% 49.08% 43.43% 
2 Emotion Transformation Feature for Detection [35] 69.44% 40.98% 52.65% 45.25% 
3 Facial Affect Involved Method [36] 72.95% 41.47% 55.00% 46.49% 
4 Multimodal Detection [37] 76.29% 69.73% 63.64% 60.54% 
5 SVM/RF/NN+Fusion [38] 77.12% 58.51% 64.64% 59.26% 
6 Multi-modal Neural Model [39] 77.92% 71.43% 71.58% 67.74% 
7 CNN+Fusion [40] 83.84% 74.94% 73.86% 73.00% 
8 Face Focused Cross-Stream Network [41] 85.19% 79.82% 78.65% 75.99% 
9 GCFM [42] 88.14% 82.46% 80.75% 78.50% 
10 Our Method 95.11% 95.39% 93.97% 94.68% 


4. CONCLUSION 

Combining AI and image detection technologies has increased their applicability and development 
within the field of information engineering. Technology enables faster access to necessary information and 
has become an increasingly popular tool to enhance the effectiveness of investigations. In this study, we used 
existing imaging equipment in interrogation rooms and a noncontact lie detection method to expand the 
application of lie detection, determine the true emotions of interrogees without their knowledge, and provide 
investigators with objective detection results. Currently, the accuracy of our system exceeds 88% on all three 
datasets. To achieve the same accuracy in practice, additional related videos and videos of individuals of 
Asian descent must be added in future training scenarios to improve the feature extraction accuracy of our 
system. 